A Programming Interface for NUMA Shared-Memory Clusters

Authors

  • Marcus Dormanns
  • Walter Sprangers
  • Hubert Ertl
  • Thomas Bemmerl
Abstract

We describe a programming interface for parallel computing on NUMA (Non-Uniform Memory Access) shared-memory machines. Although interest in this architecture is growing rapidly and more and more hardware manufacturers offer products of this type, parallelization support is still lacking. We developed SMI, the Shared Memory Interface, and implemented it as a library on an SCI-coupled cluster of workstations. It aims to provide sophisticated support that accounts for the NUMA performance characteristics and allows step-by-step parallelization. We show its application to the parallelization of a sparse matrix computation.


Similar resources

Experiments with Cholesky Factorization on Clusters of SMPs

Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...


MPC: A Unified Parallel Runtime for Clusters of NUMA Machines

Over the last decade, the Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed memory architectures such as clusters. However, the architecture of cluster nodes is currently evolving from small symmetric shared memory multiprocessors towards massively multicore, Non-Uniform Memory Access (NUMA) hardware. Although regular MPI implementations ar...


Implementing Transparent Shared Memory on Clusters Using Virtual Machines

Shared memory systems, such as SMP and ccNUMA topologies, simplify programming and administration. On the other hand, clusters of individual workstations are commonly used due to cost and scalability considerations. We have developed a virtual-machine-based solution, dubbed vNUMA, that seeks to provide a NUMA-like environment on a commodity cluster, with a single operating system instance and t...


Flexible Operating System Support for SCI Clusters

The bottleneck for many parallel and distributed applications on networks of workstations is the high cost of communication on traditional network interfaces. Memory-mapped network interfaces provide latencies of a few microseconds and bandwidths close to the maximum of the local I/O bus. Data is transferred directly between memories without involving the operating system, thereby inducing very...


OpenMP performance analysis for many-core platforms with non-uniform memory access

One of the first steps in embedded-system design flow is to choose the most efficient implementation of the embedded software application. However, this is difficult to do at the earliest design stages because particular details of the final manycore HW platform are usually unknown and many possible mappings of the software tasks/threads have to be evaluated. This paper presents a complete fram...




Publication date: 1997